# Get the Data

individuals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-06-23/individuals.csv')
locations <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-06-23/locations.csv')

# Or read in with tidytuesdayR package (https://github.com/thebioengineer/tidytuesdayR)

# Either ISO-8601 date or year/week works!

# Install via devtools::install_github("thebioengineer/tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2020-06-23')
## 
##  Downloading file 1 of 2: `locations.csv`
##  Downloading file 2 of 2: `individuals.csv`
tuesdata <- tidytuesdayR::tt_load(2020, week = 26)
## 
##  Downloading file 1 of 2: `locations.csv`
##  Downloading file 2 of 2: `individuals.csv`
individuals <- tuesdata$individuals

This report is for ETC5521 Assignment 1 by Team taipan, comprising Helen Evangelina and Yiwen Jiang.

Introduction and motivation

This report presents findings on woodland caribou between 1988 and 2016, based on tracking data collected by the B.C. Ministry of Environment & Climate Change. It mainly analyses changes in the number of woodland caribou; further analyses cover habitat changes between seasons, the effects of the management plan, and the reasons tag deployments ended. The following sections describe the data set, where it came from and what it was prepared for, and how we transformed and cleaned the raw data for analysis. The analysis was carried out in R using RStudio.

Motivation

Caribou are the only large herbivore that is widely distributed in high-elevation habitat, and they act as agents for plant and lichen diversity through trampling and foraging. The caribou has also been a significant resource for indigenous peoples for millennia (BC Ministry of Environment, 2014). The survival rate of the caribou is generally relatively low due to predation by Canis lupus (wolf). The caribou is listed as “Vulnerable” on the International Union for Conservation of Nature (IUCN) Red List. Given this conservation status, it is essential to monitor the number of caribou, as monitoring is vital to effective conservation. We present our findings in this report through the exploration of the caribou tracking data.

Data Sources

The tracking data was collected by the B.C. Ministry of Environment & Climate Change over 28 years (1988 to 2016) and was prepared for the study of the management and recovery of the caribou. It includes information on 286 caribou and covers almost 250,000 locations.

Data Limitation

vis_miss(individuals)

  • After reading the data, we observe that over half of the values in the individuals data are missing (refer to Figure 1). Most variables cannot be analysed because of their large proportion of NA values; for example, 93.36% of the values in the pregnant variable are missing.

  • Caribou have a low reproductive rate because females have only one calf per year and do not reproduce until they are two years old. The sex ratio should therefore be a good indicator of the trend in the number of caribou. However, only five of the 286 caribou are male, so any analysis that uses the sex ratio as an indicator will be biased.
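
The missing-value percentages quoted above can be computed per variable along the following lines. This is a minimal sketch on a small made-up data frame (`toy_individuals` and its values are hypothetical, not taken from the caribou data); applying the same call to `individuals` would give the per-variable percentages for the real data.

```r
# Made-up stand-in for the individuals data (values are illustrative only)
toy_individuals <- data.frame(
  animal_id = c("A1", "A2", "A3", "A4"),
  sex       = c("f", "f", "m", NA),
  pregnant  = c(NA, TRUE, NA, NA)
)

# is.na() gives a logical matrix; the column means are the share of NA per variable
missing_pct <- round(colMeans(is.na(toy_individuals)) * 100, 2)
print(missing_pct)

# Simple count of each sex, keeping NA as its own category
sex_counts <- table(toy_individuals$sex, useNA = "ifany")
print(sex_counts)
```

Reporting the percentages next to the `vis_miss()` plot makes it easier to see which variables are too sparse to analyse.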

Data description

Overview of the dataset

The dataset tracks woodland caribou in northern British Columbia and is published by the Movebank Data Repository at https://www.datarepository.movebank.org/handle/10255/move.955. The data was collected by fitting tracking tags to 260 caribou from 1988 to 2016, yielding almost 250,000 location records, and was accessed through Movebank.

The boreal woodland caribou (Rangifer tarandus caribou), also known as woodland caribou, boreal forest caribou and forest-dwelling caribou, is a North American subspecies of the reindeer (known as caribou in North America), with the vast majority of animals living in Canada. They prefer lichen-rich mature forests and mainly live in regions with marshes, bogs, lakes and rivers. Caribou are considered an ancient member of the deer family Cervidae (Banfield, 1974). They are smaller than moose (Alces americanus) and elk (Cervus canadensis), standing 1.0–1.2 m high at the shoulder (Thomas and Gray, 2002). Because the caribou is classified as “Vulnerable” on the International Union for the Conservation of Nature’s (IUCN) Red List, the data was provided for a B.C. Ministry of Environment & Climate Change study reporting on the management and recovery of the caribou.

Because this data set is used for analysing the reproduction of the species, the data was obtained by observation rather than by experiment; there is no treatment or control group. Data collection ran from 1988 to 2016. Movebank captures the locations of individual animals over time by tracking the bio-logging sensors attached to them (Kranstauber et al., 2011). The data is split into two files, both provided in .csv format. The variables in each file are listed below.

  • The individuals data comes from Mountain caribou in British Columbia-reference-data.csv and contains information on 286 caribou. Its variables are shown in the following table:

| Variable | Class | Description |
|---|---|---|
| animal_id | character | Individual identifier for animal |
| sex | character | Sex of animal |
| life_stage | character | Age class (in years) at beginning of deployment |
| pregnant | logical | Whether animal was pregnant at beginning of deployment |
| with_calf | logical | Whether animal had a calf at time of deployment |
| death_cause | character | Cause of death |
| study_site | character | Deployment site or colony, or a location-related group such as the herd or pack name |
| deploy_on_longitude | double | Longitude where animal was released at beginning of deployment |
| deploy_on_latitude | double | Latitude where animal was released at beginning of deployment |
| deploy_on_comments | character | Additional information about tag deployment |
| deploy_off_longitude | double | Longitude where deployment ended |
| deploy_off_latitude | double | Latitude where deployment ended |
| deploy_off_type | character | Classification of tag deployment end |
| deploy_off_comments | character | Additional information about tag deployment end |

  • The locations data comes from Mountain caribou in British Columbia-gps.csv and contains the location of each tracked caribou, recorded every 4 hours.

| Variable | Class | Description |
|---|---|---|
| event_id | double | Identifier for an individual measurement |
| animal_id | character | Individual identifier for animal |
| study_site | character | Deployment site or colony, or a location-related group such as the herd or pack name |
| season | character | Season (Summer/Winter) at time of measurement |
| timestamp | datetime | Date and time of measurement |
| longitude | double | Longitude of measurement |
| latitude | double | Latitude of measurement |

Data cleaning processes

The data used is from the Science update for the South Peace Northern Caribou (Rangifer tarandus caribou pop. 15) in British Columbia, available from Movebank (BC Ministry of Environment, 2014). The raw datasets are first read with the read_csv() function. The raw variable names use “-” instead of “_”. Dashes in variable names can cause problems, because a syntactically valid name in R consists only of letters, numbers, dots and underscores. Another problem is that the values in “animal-life-stage” contain inconsistent spacing. The datasets also contain many NA values. The data is therefore cleaned using the tidyverse and janitor libraries.
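
The point about dashes in variable names can be checked in base R. The sketch below uses a few hypothetical raw names in the style described above and tests them with make.names(), which returns its input unchanged only when the name is already syntactically valid. The gsub() step is just a simple stand-in for illustration; it is not how janitor's clean_names() works internally.

```r
# Hypothetical raw column names in the style described above
raw_names <- c("animal-id", "animal-life-stage", "deploy-off-latitude")

# make.names() leaves a name unchanged only if it is syntactically valid
is_valid <- make.names(raw_names) == raw_names

# Replacing each dash with an underscore yields valid names
clean <- gsub("-", "_", raw_names, fixed = TRUE)
still_valid <- make.names(clean) == clean

print(data.frame(raw_names, is_valid, clean, still_valid))
```

A name that is not syntactically valid can still be used, but only when wrapped in backticks, which is easy to forget and makes code harder to read.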

To clean the individuals data, the clean_names() function from the janitor package is first used to return a data.frame with clean names, that is, it converts the variable names into a tidier form. As mentioned above, dashes in variable names are not appropriate in R, so a raw name such as “deploy-off-latitude” becomes “deploy_off_latitude”. The result is then passed to transmute(), which computes new columns and drops the existing ones; this is used to rename the variables consistently. Within transmute(), the whitespace in the life stage values is removed with str_remove_all() to address the inconsistent spacing. The “reproductive_condition” variable actually contains two dimensions, so it is then split into “pregnant” and “with_calf” with separate(), and both new columns are converted to TRUE or FALSE values with mutate_at() and case_when().

The locations data is cleaned in the same way: clean_names() is applied first to obtain a data.frame with clean names, and transmute() is then used to compute the new columns and drop the existing ones. After both datasets have been cleaned, they are written to csv files with write_csv().

# Load libraries
library(tidyverse)
library(janitor)

# Import data
individuals_raw <- read_csv("./caribou-location-tracking/raw/Mountain caribou in British Columbia-reference-data.csv")
locations_raw <- read_csv("./caribou-location-tracking/raw/Mountain caribou in British Columbia-gps.csv")

# Clean individuals
individuals <- individuals_raw %>%
  clean_names() %>%
  transmute(animal_id,
            sex = animal_sex,
            # Getting rid of whitespace to address inconsistent spacing
            # NOTE: life stage is as of the beginning of deployment
            life_stage = str_remove_all(animal_life_stage, " "),
            reproductive_condition = animal_reproductive_condition,
            # Cause of death "cod" is embedded in a comment field
            death_cause = str_remove(animal_death_comments, ".*cod "),
            study_site,
            deploy_on_longitude,
            deploy_on_latitude,
            # Renaming to maintain consistency "deploy_on_FIELD" and "deploy_off_FIELD"
            deploy_on_comments = deployment_comments,
            deploy_off_longitude,
            deploy_off_latitude,
            deploy_off_type = deployment_end_type,
            deploy_off_comments = deployment_end_comments) %>%
  # reproductive_condition actually has two dimensions
  separate(reproductive_condition, into = c("pregnant", "with_calf"), sep = ";", fill = "left") %>%
  mutate(pregnant = str_remove(pregnant, "pregnant: ?"),
         with_calf = str_remove(with_calf, "with calf: ?")) %>%
  # TRUE and FALSE are indicated by Yes/No or Y/N
  mutate_at(vars(pregnant:with_calf), ~ case_when(str_detect(., "Y") ~ TRUE,
                                                   str_detect(., "N") ~ FALSE,
                                                   TRUE ~ NA))

# Clean locations
locations <- locations_raw %>%
  clean_names() %>%
  transmute(event_id,
            animal_id = individual_local_identifier,
            study_site = comments,
            season = study_specific_measurement,
            timestamp,
            longitude = location_long,
            latitude = location_lat)

# Write to CSV
write_csv(individuals, "./caribou-location-tracking/individuals.csv")
write_csv(locations, "./caribou-location-tracking/locations.csv")
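
To see what the separate() and case_when() steps do, here is a small sketch on made-up values. The strings in `toy` are assumptions about what the raw “reproductive_condition” field might look like, not values copied from the data, and across() is used here in place of mutate_at() as the current dplyr idiom.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Made-up examples of the combined field (hypothetical values)
toy <- tibble(
  animal_id = c("A1", "A2", "A3"),
  reproductive_condition = c("pregnant: Yes; with calf: N", "with calf: Y", NA)
)

toy_clean <- toy %>%
  # fill = "left" pads missing pieces on the left, so a lone
  # "with calf" entry lands in with_calf and pregnant becomes NA
  separate(reproductive_condition, into = c("pregnant", "with_calf"),
           sep = ";", fill = "left") %>%
  mutate(pregnant = str_remove(pregnant, "pregnant: ?"),
         with_calf = str_remove(with_calf, "with calf: ?")) %>%
  # Map Yes/Y to TRUE, No/N to FALSE, and anything else to NA
  mutate(across(c(pregnant, with_calf),
                ~ case_when(str_detect(.x, "Y") ~ TRUE,
                            str_detect(.x, "N") ~ FALSE,
                            TRUE ~ NA)))

print(toy_clean)
```

The choice of fill = "left" matters: with the default, a record that only reports the calf status would be assigned to the pregnant column instead.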

Possible questions

This dataset is primarily used to analyse changes in the number of caribou from 1988 to 2016, in order to observe the survival of the species. Because a management plan has been put in place, we would also like to analyse whether the plan has been effective in increasing the number of caribou over time.

The primary question to answer from this dataset is: what is the trend in the number of caribou over time?

From the primary question, we came up with the following secondary questions:
- Do the habitats vary between summer and winter?
- What is the trend in the classification of tag deployment end (deploy_off_type)?
- Has the management plan increased the number of caribou?

Analysis and findings

caribou_trend <- locations %>% 
  separate(timestamp, c("date", "time"), sep = " ") %>% 
  mutate(month = month(date), year = year(date)) %>%
  group_by(animal_id, study_site, month, year) %>% 
  summarise(n = 1) %>% 
  group_by(month, year) %>% 
  summarise(count = sum(n)) %>%
  mutate(date = as.Date(paste(year, as.numeric(month), "01",  sep="-"), 
                   format = "%Y-%m-%d"))
## `summarise()` regrouping output by 'animal_id', 'study_site', 'month' (override with `.groups` argument)
## `summarise()` regrouping output by 'month' (override with `.groups` argument)
trend_plot <- ggplot(caribou_trend, aes(x = date, y = count)) +
  geom_line() +
  xlab("") +
  ylab("Number of caribou tracked") +
  theme_bw()

ggplotly(trend_plot)

Monthly number of caribou tracked between 1988 and 2016


Question 1: What can we learn about the sex ratio and population change of the caribou?

Question 2: Do the habitats vary between summer and winter?

# get map data
caribou_map <- get_map(location = c(-125, 52.5, -119, 57.6), source = "osm") 
ggmap(caribou_map) +
  geom_point(data = locations,
             aes(x = longitude, y = latitude, color = season),
              alpha = 0.5, size = 0.5) +
  theme_void() +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank())

ggmap(caribou_map) +
  geom_point(data = locations, 
             aes(longitude, latitude, col = study_site), size = 0.3, alpha = 0.9) +
  gghighlight(unhighlighted_params = list(colour = "#F2EFC7"), use_direct_label = FALSE) +
  palettetown::scale_colour_poke(pokemon = "golbat") +
  guides(colour = guide_legend(title = "Herd", override.aes = list(size = 4))) +
  facet_wrap(~season, strip.position = "bottom") +
  labs(title = "Seasonal differences of habitats (Coloured by herds)") +
  theme_void() 

ggmap(caribou_map) +
  geom_point(data = locations,
             aes(x = longitude, y = latitude, color = season),
              alpha = 0.1, size = 0.5) +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank()) +
  facet_wrap(study_site~season, ncol = 4)

Question 3: How is the trend of the classification of tag deployment end (deploy_off_type)?

• Analysing the classification of tag deployment end helps to decide whether the management plan is effective in increasing the number of caribou and whether the tracking method needs to be modified. For example, if a large proportion of tag deployments ended because of equipment failure, the quality of the tags would need to be improved.
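
One way to summarise the classification of tag deployment end is to count each type and express it as a share of all deployments. The sketch below does this on a made-up column; the category labels in `toy_individuals` are hypothetical and only illustrate the shape of the summary, which could then be applied to `individuals$deploy_off_type`.

```r
library(dplyr)

# Made-up deployment end types (labels are hypothetical)
toy_individuals <- tibble(
  deploy_off_type = c("dead", "removal", "equipment failure",
                      "dead", "unknown", "dead")
)

deploy_off_summary <- toy_individuals %>%
  count(deploy_off_type, name = "n") %>%   # number of deployments per end type
  mutate(prop = n / sum(n)) %>%            # share of all deployments
  arrange(desc(n))

print(deploy_off_summary)
```

A large share for an equipment-related category would point to the tags rather than the animals as the reason deployments ended.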

Question 4: Has the management plan increased the number of caribou?

• The Peace Northern Caribou Plan, endorsed in November 2012, aims to increase the South Peace Northern Caribou population to 1,200 animals within 21 years. Using the data sets, we can analyse whether the number of animals was increasing in the preceding years, and whether it is feasible to achieve the goal in 2032.

class(locations$timestamp)
## [1] "POSIXct" "POSIXt"
locations_wrangled <- locations %>%
  mutate(month = month(timestamp),
         day = day(timestamp),
         date = date(timestamp))

The next step is to count how many unique animal_id values there are per day or per month, to see whether the number of tracked caribou is increasing.

locations_summary <- locations_wrangled %>% 
  group_by(date) %>%
  # n_distinct() counts the unique animals per day; unique() would
  # instead return one row per animal rather than a count
  summarise(animals = n_distinct(animal_id), .groups = "drop")
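
The per-month version of this count can be sketched in the same style. The example below runs on a small made-up table (`toy_locations` and its timestamps are hypothetical) and uses lubridate's floor_date() to assign each record to the first day of its month before counting distinct animals.

```r
library(dplyr)
library(lubridate)

# Made-up location records (values are illustrative only)
toy_locations <- tibble(
  animal_id = c("A1", "A1", "A2", "A3", "A3"),
  timestamp = ymd_hms(c("2001-01-05 04:00:00", "2001-01-20 08:00:00",
                        "2001-01-07 12:00:00", "2001-02-03 16:00:00",
                        "2001-02-03 20:00:00"))
)

monthly_counts <- toy_locations %>%
  # Round each timestamp down to the first instant of its month
  mutate(month_start = floor_date(timestamp, unit = "month")) %>%
  group_by(month_start) %>%
  # Count each animal once per month, however many records it has
  summarise(animals = n_distinct(animal_id), .groups = "drop")

print(monthly_counts)
```

Counting distinct animals rather than rows keeps animals with many location records from inflating the monthly total.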

References